Fast Follower Recovery for State Machine Replication
Authors
Abstract
State machine replication with a single strong Leader is widely used in modern cluster-based database systems. In practice, recovery speed has a significant impact on system availability. However, to guarantee data consistency, the existing Follower recovery protocols in Paxos replication (e.g., Raft) need multiple network round trips or extra data transmission, which may increase recovery time. In this paper, we propose the Follower Recovery using Special mark log entry (FRS) algorithm. FRS is more robust and resilient to Follower failure, and it needs only one network round trip to fetch the minimum number of log entries. The approach is implemented in the open source database system OceanBase. We show experimentally that a system adopting FRS performs well in terms of recovery time.
Related resources
Protocol-Aware Recovery for Consensus-Based Storage
We introduce protocol-aware recovery (PAR), a new approach that exploits protocol-specific knowledge to correctly recover from storage faults in distributed systems. We demonstrate the efficacy of PAR through the design and implementation of corruption-tolerant replication (CTRL), a PAR mechanism specific to replicated state machine (RSM) systems. We experimentally show that the CTRL versions o...
Fast Log Replication in Highly Available Data Store
Modern large-scale data stores widely adopt consensus protocols to achieve high availability and throughput. The recently proposed Raft algorithm is easier to understand and is widely implemented in a large number of open source projects. In these consensus algorithms, including Raft, log replication is a common and frequently used operation which has significant impact on the system performance...
Virtually Synchronous Methodology for Dynamic Service Replication
In designing and building distributed systems, it is common engineering practice to separate steady-state (“normal”) operation from abnormal events such as recovery from failure. This way the normal case can be optimized extensively while recovery can be amortized. However, integrating the recovery procedure with the steady-state protocol is often far from obvious, and can present subtle diffic...
Tapping TCP Streams
Providing transparent replication of servers has been a major goal in the fault tolerance community. Transparent replication is particularly challenging for highly nondeterministic applications, such as the ones that use multithreading. For such applications, keeping replicas in a consistent state becomes non-trivial. One way to deal with the non-determinism is to use a leader/follower approach...
Checkpointing in Parallel State-Machine Replication
State-machine replication is a popular approach to building fault-tolerant systems, which relies on the sequential execution of commands to guarantee strong consistency. Sequential execution, however, threatens performance. Recently, several proposals have suggested parallelizing the execution model of the replicas to enhance state-machine replication’s performance. Despite their success in acc...
Publication date: 2017